Skip to content

feat(compass-collection): Schema Analysis Redux Integration for Collection Plugin CLOUDP-333846 #7177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

ncarbon
Copy link
Collaborator

@ncarbon ncarbon commented Aug 7, 2025

Description

This pull request introduces schema analysis functionality to the Compass Collection Tab, enabling the application to automatically analyze a collection's schema when it is loaded (unless the collection is read-only or time-series). The changes include new Redux state and actions for schema analysis, and a thunk to perform the analysis.

Motivation

Schema Analysis Feature:

  • Added a new SchemaAnalysis state to CollectionState, with status tracking (INITIAL, ANALYZING, COMPLETED, ERROR), schema data, sample document, schema metadata, and error information.
  • Implemented a thunk action analyzeCollectionSchema that samples documents, analyzes the schema, calculates metadata (like max nesting depth and validation rules), and updates the Redux state accordingly.
  • Updated the reducer to handle new schema analysis actions, updating the state based on progress and results.

Store Initialization and Integration:

  • Modified the store initialization to include the new schemaAnalysis state and inject required dependencies (logger, preferences, abort controller) for schema analysis.
  • On collection metadata load, the store now automatically dispatches schema analysis (unless the collection is read-only or time-series).

Testing Enhancements:

  • Updated and extended tests to mock and verify schema analysis behavior, ensuring it only runs for eligible collections and is skipped for read-only or time-series collections. [1] [2]

Checklist

  • New tests and/or benchmarks are included
  • Documentation is changed or added
  • If this change updates the UI, screenshots/videos are added and a design review is requested
  • I have signed the MongoDB Contributor License Agreement (https://www.mongodb.com/legal/contributor-agreement)

Motivation and Context

To extract schema information (types and validation rules) that will be used in the upcoming mock data generator feature.

  • Bugfix
  • New feature
  • Dependency update
  • Misc

Open Questions

Dependents

Types of changes

  • Backport Needed
  • Patch (non-breaking change which fixes an issue)
  • Minor (non-breaking change which adds functionality)
  • Major (fix or feature that would cause existing functionality to change)

@github-actions github-actions bot added the feat label Aug 7, 2025
@ncarbon ncarbon added no release notes Fix or feature not for release notes and removed feat labels Aug 7, 2025
@github-actions github-actions bot added the feat label Aug 7, 2025
@ncarbon ncarbon marked this pull request as ready for review August 7, 2025 21:51
@Copilot Copilot AI review requested due to automatic review settings August 7, 2025 21:51
@ncarbon ncarbon requested a review from a team as a code owner August 7, 2025 21:51
@ncarbon ncarbon requested review from gribnoysup and jcobis August 7, 2025 21:51
Copilot

This comment was marked as outdated.


interface SchemaAnalysisFinishedAction {
type: CollectionActions.SchemaAnalysisFinished;
schemaAnalysis: SchemaAnalysis;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actions shouldn't be controlling the state, reducer should, passing the whole state and just assigning it in the reducer is an antipattern that should be avoided, don't do this. Clearly separate what is the action payload and what is the state that reducer will derive based on the action

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this. Wondering if it's worth it to move this to its own reducer. Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty self contained, so can be its own slice, yeah

@ncarbon ncarbon requested a review from gribnoysup August 11, 2025 15:59
@ncarbon
Copy link
Collaborator Author

ncarbon commented Aug 11, 2025

@jcobis @gribnoysup Heads up - Adding compass-schema as a dependency in compass-collection in order to use calculateSchemaMetadata and get the schema depth, but this introduced a cyclical dep.

In order to remove the cyclical dependency, I think we'd have to remove the compass-collection dependency from compass-schema and compass-schema-validation since they're both importing CollectionTabPluginMetadata.

Since we only need schema_depth from this function, I'm considering extracting/duplicating just part of this calculation in compass-collection.

},
A
>;

export enum SchemaAnalysisStatus {
Copy link

@kpamaran kpamaran Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] fyi there's another thread that advocates the compass repos's preference for unions over enums in #7181 (comment). I personally prefer them over enums because they enable type algebra like creating intersection/union types, and enums add runtime artifacts that union types do not

ERROR = 'error',
}

type SchemaAnalysis = {
Copy link

@kpamaran kpamaran Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| null in various fields is a sign that we can apply discriminated fields to get type narrowing benefits.

You can break this into a type for each Status's state

type SchemaAnalysis = SchemaAnalysisError | SchemaAnalysisInitial | SchemaAnalysisAnalyzing | SchemaAnalysisCompleted`

// example. notice there's no `error` field or `| null`
type SchemaAnalysisCompleted = {
   schema: Schema;
   status: SchemaAnalysisStatus;
   sampleDocument: Document;
   schemaMetaData: { ... };
}

// etc

each type shares the status but the status helps the type checker see what fields exist based on the status

The TypeScript handbook has an example here

This should simplify the action action+reducer code as well by removing entries for nullable fields

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very helpful! I'm gonna move these types to their own file as well!

return;
}

try {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume it's possible to go to ANALYZING directly from COMPLETED or FAILED as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's possible. For now, this function will be invoked when collection metadata is fetched successfully.

sampleDocument: sampleDocuments[0] ?? null,
schemaMetadata,
});
} catch (err: any) {
Copy link

@kpamaran kpamaran Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there documented errors that can be caught here? Would help devs looking at stack traces narrow the source of the issue earlier, especially because there's multiple await calls

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I don't think this is extensive, but I copied the error handling logic from schema-analysis-reducer to this file and added a SchemaAnalysis specific error type. See getErrorDetails. This would help with error handling in the UI as well.

}

let schemaMetadata = null;
if (schema !== null) {
Copy link

@kpamaran kpamaran Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen if schema stays null? Seems like an edge case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! I was assuming a null schema would get handled in the UI code once the analysis completed. If we don't have a schema, our schema analysis is incomplete.

Should we consider this a "failure" state? Something else? @jcobis Do you have thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the return type of schemaAccessor.getInternalSchema() is non-null Schema. This may be an unnecessary check.

Copy link
Collaborator

@jcobis jcobis Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think it's an unnecessary check. We probably don't need to check if (schemaAccessor) either

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the analysis fails I think the exception would be thrown by getInternalSchema(), which would be caught by the surrounding try/catch block

@gribnoysup
Copy link
Collaborator

In order to remove the cyclical dependency, I think we'd have to remove the compass-collection dependency from compass-schema and compass-schema-validation since they're both importing CollectionTabPluginMetadata.

@ncarbon I don't think you can easily remove the dependency here, compass-collection is the place that keeps the source of truth for this type, you can't really move it somewhere else without creating an awkward intermediate place for it and I think I'd prefer we avoid doing that 🙂

I think if you want to just implement the depth counting somewhere near the code that does the check that's fine (in your case you don't even need to count the full depth, right? only that it doesn't hit a relatively small limit). You should also consider moving this code to mongodb-schema package that is already this one place where we accumulate all the shared code related to schema analysis.

Copy link
Collaborator

@jcobis jcobis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Nataly!

Comment on lines 207 to 215
const sampleCursor = dataService.sampleCursor(
namespace,
samplingOptions,
driverOptions,
{
fallbackReadPreference: 'secondaryPreferred',
}
);
const sampleDocuments: Array<Document> = await sampleCursor.toArray();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use .sample if you're immediately iterating the whole cursor

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated!

@ncarbon ncarbon requested a review from kpamaran August 13, 2025 15:35
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces schema analysis functionality to the Compass Collection Tab, automatically analyzing a collection's schema when it loads (excluding read-only and time-series collections). The implementation includes new Redux state management for schema analysis with comprehensive status tracking and error handling.

  • Adds new Redux state management for schema analysis with status tracking (initial, analyzing, completed, error)
  • Implements automatic schema analysis on collection load with proper filtering for read-only and time-series collections
  • Introduces schema depth calculation utility for analyzing nested document structures

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
collection-tab.ts Adds schema analysis state initialization and triggers analysis on metadata load
collection-tab.spec.ts Updates tests to mock schema analysis and verify conditional execution
schema-analysis-types.ts Defines TypeScript types for schema analysis state management
collection-tab.ts (modules) Implements core schema analysis thunk action with document sampling and error handling
calculate-schema-depth.ts Utility function for calculating maximum nesting depth in schema structures
calculate-schema-depth.spec.ts Comprehensive test suite for schema depth calculation
package.json Adds required dependencies for schema analysis functionality

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@kpamaran kpamaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diffs for my last feedback and additions around calculate-schema-depth are good for me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat no release notes Fix or feature not for release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants